Reduce boot-time memory fragmentation

On certain NUMA configurations, having init_node_heap() consume the
first few pages of a new node's memory for internal data structures
leads to unnecessary memory fragmentation. With sufficiently many
nodes, this can leave too little memory below 4G for Dom0 to set up
its swiotlb and PCI-consistent buffers.

Since alloc_boot_pages() generally consumes from the end of available
regions, make init_node_heap() prefer the end of such regions too (so
that fragmentation occurs at only one end of a region).

(Adjustment from first version: Use the tail of the region when the
end address's alignment is less than or equal to the beginning one's,
not just when it's less.)
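
As a rough illustration of the head-versus-tail choice described above,
the rule can be sketched as follows. This is a simplified stand-in, not
the code in the patch: alignment_bits() and prefer_tail() are
hypothetical names, and alignment is approximated by the number of
trailing zero bits in a page frame number.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: number of trailing zero bits in a page frame
 * number, used here as a measure of how well the address is aligned.
 * Frame 0 is treated as maximally aligned.
 */
static unsigned int alignment_bits(unsigned long pfn)
{
    unsigned int bits = 0;

    if (pfn == 0)
        return (unsigned int)(sizeof(pfn) * CHAR_BIT);

    while (!(pfn & 1)) {
        pfn >>= 1;
        bits++;
    }
    return bits;
}

/*
 * Hypothetical helper: carve internal structures from the tail of
 * [start_pfn, end_pfn) when the end is aligned no better than the
 * start ("less than or equal", per the adjustment noted above).
 */
static bool prefer_tail(unsigned long start_pfn, unsigned long end_pfn)
{
    return alignment_bits(end_pfn) <= alignment_bits(start_pfn);
}
```

For example, a region spanning frames 0x100000 to 0x180000 would be
carved from the tail under this rule, since its end is aligned to fewer
bits (19) than its start (20).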

Further, in order to prefer allocations from higher memory locations,
insert memory regions in reverse order in end_boot_allocator(). The
one exception is that a region residing on the boot CPU's node is
inserted first, so that the statically allocated structures (which are
used for the first node seen) end up being used for this node.
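
The insertion order described above can be sketched roughly as below.
This is an illustrative model only: struct region and
insertion_order() are hypothetical names, and picking the lowest-indexed
region on the boot CPU's node as the one inserted first is an
assumption, since the text only says "one region".

```c
#include <stddef.h>

/* Hypothetical description of a free boot memory region. */
struct region {
    unsigned long start_pfn;
    unsigned long end_pfn;
    unsigned int node;
};

/*
 * Fill order[] with the indices of regions[] in the order they would
 * be handed to the heap: one region on boot_node first (assumed here
 * to be the lowest-indexed one), then all remaining regions from the
 * last to the first. Returns the number of entries written.
 */
static size_t insertion_order(const struct region *regions, size_t n,
                              unsigned int boot_node, size_t *order)
{
    size_t count = 0, first = n, i;

    for (i = 0; i < n; i++) {
        if (regions[i].node == boot_node) {
            first = i;
            order[count++] = i;
            break;
        }
    }

    /* Reverse walk, skipping the region already inserted. */
    for (i = n; i-- > 0; )
        if (i != first)
            order[count++] = i;

    return count;
}
```

If no region lies on the boot CPU's node, this sketch simply falls back
to pure reverse order.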

Finally, reduce MAX_ORDER on x86 to the maximum useful value (1GB), so
that the reservation of a page on node boundaries (again leading to
fragmentation) can be avoided as much as possible (having node
boundaries on addresses aligned to less than 1GB is expected to be
rare, if found in practice at all).
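
To make the arithmetic behind the 1GB limit concrete, here is a small
sketch. It assumes 4KiB pages (a page shift of 12); order_for_size()
and boundary_needs_reserved_page() are hypothetical helpers that only
model the alignment condition described above, not the allocator's
actual code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: smallest order such that 2^order pages of
 * size 2^page_shift bytes cover at least 'bytes' bytes.
 */
static unsigned int order_for_size(unsigned long long bytes,
                                   unsigned int page_shift)
{
    unsigned int order = 0;

    while (((1ULL << page_shift) << order) < bytes)
        order++;
    return order;
}

/*
 * Hypothetical model of the condition above: a node boundary at
 * frame 'pfn' only calls for a reserved page when it is aligned to
 * less than the largest allocation order.
 */
static bool boundary_needs_reserved_page(unsigned long pfn,
                                         unsigned int max_order)
{
    return (pfn & ((1UL << max_order) - 1)) != 0;
}
```

Under these assumptions 1GB corresponds to order 18, so a node boundary
at frame 0x40000 (exactly 1GB) needs no reserved page with a maximum
order of 18, whereas it would with a larger maximum order such as 20.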

Signed-off-by: Jan Beulich <jbeulich@novell.com>